Normal Factor Graphs and Holographic Transformations
This paper stands at the intersection of two distinct lines of research. One
line is "holographic algorithms," a powerful approach introduced by Valiant for
solving various counting problems in computer science; the other is "normal
factor graphs," an elegant framework proposed by Forney for representing codes
defined on graphs. We introduce the notion of holographic transformations for
normal factor graphs, and establish a very general theorem, called the
generalized Holant theorem, which relates a normal factor graph to its
holographic transformation. We show that the generalized Holant theorem on the
one hand underlies the principle of holographic algorithms, and on the other
hand reduces to a general duality theorem for normal factor graphs, a special
case of which was first proved by Forney. In the course of our development, we
formalize a new semantics for normal factor graphs, which highlights various
linear algebraic properties that potentially enable the use of normal factor
graphs as a linear algebraic tool.
Comment: To appear in IEEE Trans. Inform. Theory
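As a toy illustration of the invariance at the heart of the (generalized) Holant theorem, the sketch below evaluates the exterior function of the simplest normal factor graph, two function nodes joined by a single edge, before and after a holographic transformation. The variable names and the numerical setup are illustrative, not taken from the paper.

```python
import numpy as np

# Simplest normal factor graph: two degree-1 function nodes f and g joined
# by one edge carrying a variable over a q-ary alphabet. Its exterior
# function ("partition sum") is Z = sum_x f(x) * g(x).
rng = np.random.default_rng(0)
q = 4
f = rng.standard_normal(q)
g = rng.standard_normal(q)

Z = float(f @ g)  # sum over the edge variable

# Holographic transformation: apply an invertible T on one side of the
# edge and its inverse on the other; the exterior function is unchanged.
T = rng.standard_normal((q, q)) + q * np.eye(q)  # invertible (diagonally dominated)
f_t = T.T @ f                # transformed local function at one endpoint
g_t = np.linalg.inv(T) @ g   # inverse transform at the other endpoint

Z_t = float(f_t @ g_t)
assert np.isclose(Z, Z_t)    # the transformation leaves Z invariant
```

The cancellation is immediate here (f^T T T^{-1} g = f^T g); the theorem's content is that the same invariance holds for arbitrary normal factor graphs when every edge is transformed consistently.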
MixUp as Locally Linear Out-Of-Manifold Regularization
MixUp is a recently proposed data-augmentation scheme, which linearly
interpolates a random pair of training examples and correspondingly the one-hot
representations of their labels. Training deep neural networks with such
additional data has been shown to significantly improve the predictive
accuracy of state-of-the-art models. The power of MixUp, however, is primarily
established empirically, and its workings and effectiveness have not been
explained in any depth. In this paper, we develop an understanding of MixUp as
a form of "out-of-manifold regularization", which imposes certain "local
linearity" constraints on the model's input space beyond the data manifold.
This analysis enables us to identify a limitation of MixUp, which we call
"manifold intrusion". In a nutshell, manifold intrusion in MixUp is a form of
under-fitting resulting from conflicts between the synthetic labels of the
mixed-up examples and the labels of original training data. Such a phenomenon
usually happens when the parameters controlling the generation of mixing
policies are not sufficiently fine-tuned on the training data. To address this
issue, we propose a novel adaptive version of MixUp, where the mixing policies
are automatically learned from the data using an additional network and
objective function designed to avoid manifold intrusion. The proposed
regularizer, AdaMixUp, is empirically evaluated on several benchmark datasets.
Extensive experiments demonstrate that AdaMixUp improves upon MixUp when
applied to state-of-the-art deep classification models.
Comment: Accepted by AAAI 2019
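The interpolation scheme described above can be sketched in a few lines. This is plain MixUp (not the AdaMixUp variant proposed in the paper), using the Beta(alpha, alpha) mixing policy of the original recipe; the function name and defaults here are illustrative.

```python
import numpy as np

def mixup_pair(x1, y1, x2, y2, num_classes, alpha=0.2, rng=None):
    """Linearly interpolate a pair of inputs and their one-hot labels."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)              # mixing coefficient in (0, 1)
    one_hot = np.eye(num_classes)
    x = lam * x1 + (1.0 - lam) * x2                    # mixed input
    y = lam * one_hot[y1] + (1.0 - lam) * one_hot[y2]  # mixed soft label
    return x, y, lam

rng = np.random.default_rng(0)
x, y, lam = mixup_pair(np.ones(3), 0, np.zeros(3), 1, num_classes=2, rng=rng)
assert np.isclose(y.sum(), 1.0)  # the mixed label remains a distribution
```

Manifold intrusion, in this picture, occurs when a mixed input x lands back on the data manifold near a real example of some class while its synthetic label y assigns that class less than full weight.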
Tighter Information-Theoretic Generalization Bounds from Supersamples
In this work, we present a variety of novel information-theoretic
generalization bounds for learning algorithms, from the supersample setting of
Steinke & Zakynthinou (2020), the setting of the "conditional mutual
information" framework. Our development builds on projecting the loss pair
(obtained from a training instance and a testing instance) down to a single
number and correlating loss values with a Rademacher sequence (and its shifted
variants). The presented bounds include square-root bounds, fast-rate bounds
(including those based on variance and sharpness), and bounds for
interpolating algorithms, among others. We show theoretically or empirically that these bounds are
tighter than all information-theoretic bounds known to date in the same
supersample setting.
Comment: Accepted to ICML 2023
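The supersample setting referenced above can be made concrete with a toy learner. The identity checked at the end, that the generalization gap equals a correlation between per-pair loss differences and Rademacher signs derived from the selector bits, is the quantity such bounds control; the mean estimator and all names below are illustrative.

```python
import numpy as np

# Supersample: n i.i.d. pairs of examples; a Bernoulli bit U_i picks which
# element of pair i is used for training, the other for testing.
rng = np.random.default_rng(0)
n = 1000
pairs = rng.standard_normal((n, 2))
U = rng.integers(0, 2, size=n)           # selector bits
train = pairs[np.arange(n), U]
test = pairs[np.arange(n), 1 - U]

w = train.mean()                         # toy learner: training-set mean
loss = lambda z: (z - w) ** 2            # squared loss

gap = loss(test).mean() - loss(train).mean()   # generalization gap

eps = 1 - 2 * U                                # Rademacher signs from U
delta = loss(pairs[:, 1]) - loss(pairs[:, 0])  # per-pair loss difference
# The gap is exactly the empirical correlation of eps with delta.
assert np.isclose(gap, (eps * delta).mean())
```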
Two Facets of SDE Under an Information-Theoretic Lens: Generalization of SGD via Training Trajectories and via Terminal States
Stochastic differential equations (SDEs) have recently been shown to provide
a good characterization of the dynamics of training machine learning models
with SGD. This
provides two opportunities for better understanding the generalization
behaviour of SGD through its SDE approximation. First, under the SDE
characterization, SGD may be regarded as the full-batch gradient descent with
Gaussian gradient noise. This allows the application of the generalization
bounds developed by Xu & Raginsky (2017) to the analysis of the generalization
behaviour of SGD, resulting in upper bounds in terms of the mutual information
between the training set and the training trajectory. Second, under mild
assumptions, it is possible to obtain an estimate of the steady-state weight
distribution of SDE. Using this estimate, we apply the PAC-Bayes-like
information-theoretic bounds developed in both Xu & Raginsky (2017) and Negrea
et al. (2019) to obtain generalization upper bounds in terms of the KL
divergence of the steady-state weight distribution of SGD with respect to a
prior distribution. Among various options, one may choose the prior as the
steady-state weight distribution obtained by SGD on the same training set but
with one example held out. In this case, the bound can be elegantly expressed
using the influence function (Koh & Liang, 2017), which suggests that the
generalization of the SGD is related to the stability of SGD. Various insights
are presented along the development of these bounds, which are subsequently
validated numerically
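The first viewpoint, SGD as full-batch gradient descent plus (approximately Gaussian) gradient noise, can be sanity-checked on a toy least-squares problem; the setup below illustrates the noise decomposition only, not the paper's construction.

```python
import numpy as np

# 1-d least squares: loss(w) = mean over i of 0.5 * (x_i * w - y_i)^2.
rng = np.random.default_rng(0)
N = 5000
x = rng.standard_normal(N)
y = 2.0 * x + 0.1 * rng.standard_normal(N)

def grad(w, idx):
    # gradient of the loss averaged over the batch idx
    return np.mean(x[idx] * (x[idx] * w - y[idx]))

w = 0.0
full = grad(w, np.arange(N))             # full-batch gradient at w

def noise_stats(batch, trials=2000):
    # mini-batch gradient = full gradient + zero-mean noise
    g = np.array([grad(w, rng.choice(N, batch, replace=False))
                  for _ in range(trials)])
    return g.mean() - full, g.std()

bias32, std32 = noise_stats(32)
bias256, std256 = noise_stats(256)
assert abs(bias32) < 0.1 and abs(bias256) < 0.1  # noise is ~zero-mean
assert std256 < std32   # larger batches -> smaller gradient noise
```

The SDE view models this zero-mean noise as Gaussian with a batch-size-dependent covariance, which is what lets the information-theoretic machinery above be applied.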
Information-Theoretic Analysis of Unsupervised Domain Adaptation
This paper uses information-theoretic tools to analyze the generalization
error in unsupervised domain adaptation (UDA). We present novel upper bounds
for two notions of generalization errors. The first notion measures the gap
between the population risk in the target domain and that in the source domain,
and the second measures the gap between the population risk in the target
domain and the empirical risk in the source domain. While our bounds for the
first kind of error are in line with the traditional analysis and give similar
insights, our bounds on the second kind of error are algorithm-dependent and
also provide insights into algorithm design. Specifically, we present two
simple techniques for improving generalization in UDA and validate them
experimentally.
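The two notions of generalization error above can be illustrated numerically on a toy problem with covariate shift; the learner and the distributions below are stand-ins chosen purely for the sketch.

```python
import numpy as np

# Source and target domains differ by a mean shift; the learner only sees
# labeled source data. Large samples stand in for population risks.
rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, 500)           # labeled source training sample
src_pop = rng.normal(0.0, 1.0, 200_000)   # ~ source population
tgt_pop = rng.normal(0.5, 1.0, 200_000)   # ~ shifted target population

w = src.mean()                            # hypothesis fit on the source data
risk = lambda z: np.mean((z - w) ** 2)

gap1 = risk(tgt_pop) - risk(src_pop)  # target pop. risk - source pop. risk
gap2 = risk(tgt_pop) - risk(src)      # target pop. risk - source emp. risk
assert gap1 > 0   # the domain shift inflates the target risk
```

The first gap isolates the effect of the domain shift; the second additionally depends on how the algorithm fits the source sample, which is why bounds on it can be algorithm-dependent.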